
Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines



Abstract

Image processing pipelines combine the challenges of stencil computations and stream programs. They are composed of large graphs of different stencil stages, as well as complex reductions, and stages with global or data-dependent access patterns. Because of their complex structure, the performance difference between a naive implementation of a pipeline and an optimized one is often an order of magnitude. Efficient implementations require optimization of both parallelism and locality, but due to the nature of stencils, there is a fundamental tension between parallelism, locality, and introducing redundant recomputation of shared values.

We present a systematic model of the tradeoff space fundamental to stencil pipelines, a schedule representation which describes concrete points in this space for each stage in an image processing pipeline, and an optimizing compiler for the Halide image processing language that synthesizes high performance implementations from a Halide algorithm and a schedule. Combining this compiler with stochastic search over the space of schedules enables terse, composable programs to achieve state-of-the-art performance on a wide range of real image processing pipelines, and across different hardware architectures, including multicores with SIMD, and heterogeneous CPU+GPU execution. From simple Halide programs written in a few hours, we demonstrate performance up to 5x faster than hand-tuned C, intrinsics, and CUDA implementations optimized by experts over weeks or months, for image processing applications beyond the reach of past automatic compilers.
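
The tension the abstract describes between locality and redundant recomputation can be made concrete with a small sketch. The code below is plain Python, not Halide, and every name in it (`blur_x`, `breadth_first`, `fused`) is hypothetical and chosen only for illustration. It assumes a two-stage 3-tap stencil pipeline (a horizontal sum followed by a vertical sum) and evaluates it in two ways: once with the whole first stage stored in an intermediate buffer, and once with the first stage recomputed at every point where the second stage needs it. Both orders give the same output; they differ in how many first-stage evaluations they perform and how much intermediate storage they need.

```python
def blur_x(img, x, y, counter):
    """First stage: horizontal 3-tap sum at (x, y). Counts each evaluation."""
    counter[0] += 1
    return img[y][x - 1] + img[y][x] + img[y][x + 1]


def breadth_first(img):
    """Compute all of stage one into a buffer, then run stage two on it.

    Returns (output, stage-one evaluations, intermediate values stored).
    """
    h, w = len(img), len(img[0])
    counter = [0]
    # The buffer holds one stage-one value per (row, interior column).
    bx = [[blur_x(img, x, y, counter) for x in range(1, w - 1)]
          for y in range(h)]
    out = [[bx[y - 1][i] + bx[y][i] + bx[y + 1][i] for i in range(w - 2)]
           for y in range(1, h - 1)]
    return out, counter[0], h * (w - 2)


def fused(img):
    """Recompute stage one at each use inside stage two; no buffer is kept.

    Returns (output, stage-one evaluations, intermediate values stored).
    """
    h, w = len(img), len(img[0])
    counter = [0]
    out = [[sum(blur_x(img, x, y + dy, counter) for dy in (-1, 0, 1))
            for x in range(1, w - 1)]
           for y in range(1, h - 1)]
    return out, counter[0], 0


if __name__ == "__main__":
    # Small deterministic test image: 6 rows by 8 columns.
    image = [[(3 * x + 5 * y) % 7 for x in range(8)] for y in range(6)]
    for name, strategy in (("breadth_first", breadth_first), ("fused", fused)):
        _, evals, stored = strategy(image)
        print(name, "stage-one evaluations:", evals, "values stored:", stored)
```

On this 6 by 8 input, the buffered order evaluates the first stage 36 times and stores 36 intermediate values, while the fused order stores none but evaluates the first stage 72 times. These are only the two extremes of the tradeoff; the sketch does not attempt to model parallelism or any intermediate strategy.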
